Off-Target Tracking Using Feature Aiding in the Context of Inertial Navigation

ABSTRACT

A Visual Inertial Tracker (VIT), such as a Simultaneous Localization And Mapping (SLAM) system based on an Extended Kalman Filter (EKF) framework (EKF-SLAM), can provide drift correction in calculations of a pose (translation and orientation) of a mobile device by obtaining location information regarding a target, obtaining an image of the target, estimating, from the image of the target, measurements relating to a pose of the mobile device based on the image and location information, and correcting a pose determination of the mobile device using an EKF, based, at least in part, on the measurements relating to the pose of the mobile device.

This application claims the benefit of U.S. Provisional Patent Application Ser. No. 61/883,921, entitled “OFF-TARGET TRACKING USING FEATURE AIDING IN THE CONTEXT OF INERTIAL NAVIGATION,” filed on Sep. 27, 2013, which is hereby incorporated by reference for all purposes as if fully set forth herein.

BACKGROUND

Camera-based tracking of a mobile device's rotation and translation can be executed by the mobile device (e.g., mobile phone, tablet, heads-up display, and the like) to enable a wide variety of features, such as augmented reality and location tracking. To track the mobile device's translation and orientation (also known as the six degrees of freedom, or “pose”) more accurately, a mobile device may additionally incorporate information from sensors such as gyroscopes, accelerometers, GPS receivers, and the like. However, sensor noise and modeling errors can cause a tracking system to “drift,” resulting in inaccurate pose determinations. These errors compound, growing over time unless measures are taken to correct the pose determinations. Furthermore, GPS reception is poor to nonexistent indoors, and inertial sensors alone cannot provide an absolute pose.

SUMMARY

A Visual Inertial Tracker (VIT), such as a Simultaneous Localization And Mapping (SLAM) system based on an Extended Kalman Filter (EKF) framework (EKF-SLAM), can provide drift correction in calculations of a pose (translation and orientation) of a mobile device.

An example method of correcting drift in a tracking system of a mobile device, according to the disclosure, includes obtaining location information regarding a target, obtaining an image of the target, the image captured by the mobile device, and estimating, from the image of the target, measurements relating to a pose of the mobile device based on the image and location information. The pose comprises information indicative of a translation and orientation of the mobile device. The method further comprises correcting a pose determination of the mobile device using an EKF, based, at least in part, on the measurements relating to the pose of the mobile device.

The example method of correcting drift in a tracking system of a mobile device can include one or more of the following features. The method can include obtaining absolute coordinates of the target, where correcting the pose is further based, at least in part, on the absolute coordinates of the target. The method can include processing the image of the target to determine that the target was captured in the image, where the processing includes comparing one or more keypoints of the image with one or more keypoints of each target in a plurality of known targets. The method can include receiving one or more wireless signals from one or more access points, determining a proximity of the one or more access points based on wireless signals, and determining the plurality of known targets, based on the determined proximity of the one or more access points. The tracking system can incorporate a Simultaneous Localization And Mapping (SLAM) system with the EKF. The pose determination can be based, at least in part, on measurements from one or more of an accelerometer or a gyroscope of the mobile device. The method can include determining a bias of one or more of the accelerometer or the gyroscope of the mobile device, based at least in part on the pose determination and the corrected pose.

An example mobile device, according to the disclosure, can include a camera, a memory, and a processing unit. The processing unit is operatively coupled with the camera and the memory and configured to obtain location information regarding a target, obtain an image of the target, the image captured by the camera of the mobile device, and estimate, from the image of the target, measurements relating to a pose of the mobile device based on the image and location information, where the pose comprises information indicative of a translation and orientation of the mobile device. The processing unit is further configured to correct a pose determination of the mobile device using an Extended Kalman Filter (EKF), based, at least in part, on the measurements relating to the pose of the mobile device.

The example mobile device can include one or more of the following features. The processing unit can be configured to obtain absolute coordinates of the target and further configured to correct the pose based, at least in part, on the absolute coordinates of the target. The processing unit can be configured to process the image of the target to determine that the target was captured in the image, where processing the image includes comparing one or more features of the image with one or more features of each target in a plurality of known targets. The mobile device may include a wireless communication interface configured to receive one or more wireless signals from one or more access points, and the processing unit can be further configured to determine a proximity of the one or more access points based on the wireless signals, and determine the plurality of known targets, based on the determined proximity of the one or more access points. The processing unit can be configured to incorporate a Simultaneous Localization And Mapping (SLAM) system with the EKF. The mobile device can include one or more motion sensors, and the processing unit can be further configured to make the pose determination based, at least in part, on one or more measurements received from the one or more motion sensors. The one or more motion sensors can include one or more of an accelerometer or a gyroscope. The processing unit can be configured to determine a bias of the one or more motion sensors, based at least in part on the pose determination and the corrected pose.

An example apparatus, according to the disclosure, can include means for obtaining location information regarding a target, means for obtaining an image of the target, the image captured by a mobile device, and means for estimating, from the image of the target, measurements relating to a pose of the mobile device based on the image and location information, where the pose comprises information indicative of a translation and orientation of the mobile device. The example apparatus further includes means for correcting a pose determination of the mobile device using an Extended Kalman Filter (EKF), based, at least in part, on the measurements relating to the pose of the mobile device.

The example apparatus can further include one or more of the following features. The apparatus can include means for obtaining absolute coordinates of the target, where the means for correcting the pose is configured to base the corrected pose, at least in part, on the absolute coordinates of the target. The apparatus can include means for processing the image of the target to determine that the target was captured in the image, where the means for processing the image includes means for comparing one or more features of the image with one or more features of each target in a plurality of known targets. The apparatus can include means for receiving one or more wireless signals from one or more access points, means for determining a proximity of the one or more access points based on the wireless signals, and means for determining the plurality of known targets, based on the determined proximity of the one or more access points. The apparatus can include means for incorporating a Simultaneous Localization And Mapping (SLAM) system with the EKF. The apparatus can include means for basing the pose determination, at least in part, on measurements of one or more of an accelerometer or a gyroscope of the mobile device. The apparatus can include means for determining a bias of one or more of the accelerometer or the gyroscope of the mobile device, based at least in part on the pose determination and the corrected pose.

An example non-transitory machine-readable medium, according to the disclosure, can have instructions embedded thereon for correcting drift in a tracking system of a mobile device. The instructions include computer code for obtaining location information regarding a target, obtaining an image of the target, the image captured by the mobile device, and estimating, from the image of the target, measurements relating to a pose of the mobile device based on the image and location information, where the pose comprises information indicative of a translation and orientation of the mobile device. The instructions also include computer code for correcting a pose determination of the mobile device using an Extended Kalman Filter (EKF), based, at least in part, on the measurements relating to the pose of the mobile device.

The example non-transitory machine-readable medium can further include instructions including computer code for one or more of the following features. Instructions can include computer code for obtaining absolute coordinates of the target, wherein the computer code is further configured to base correcting the pose, at least in part, on the absolute coordinates of the target. Instructions can include computer code for processing the image of the target to determine that the target was captured in the image, wherein the computer code for processing includes computer code for comparing one or more features of the image with one or more features of each target in a plurality of known targets. Instructions can include computer code for receiving one or more wireless signals from one or more access points, determining a proximity of the one or more access points based on wireless signals, and determining the plurality of known targets, based on the determined proximity of the one or more access points. Instructions can include computer code for incorporating a Simultaneous Localization And Mapping (SLAM) system with the EKF. The computer code can be configured to base the pose determination, at least in part, on measurements from one or more of an accelerometer or a gyroscope of the mobile device. Instructions can include computer code for determining a bias of one or more of the accelerometer or the gyroscope of the mobile device, based at least in part on the pose determination and the corrected pose.

Items and/or techniques described herein may provide one or more of the following capabilities, as well as other capabilities not mentioned. Techniques can provide for the mitigation of drift in an indoor location tracking system, such as a Visual Inertial Tracker (VIT), providing for increased accuracy. This, in turn, can lead to a better user experience of applications and/or other features of a mobile device that are dependent on the indoor location tracking system. These and other advantages and features are described in more detail in conjunction with the text below and attached figures.

BRIEF DESCRIPTION OF THE DRAWINGS

An understanding of the nature and advantages of various embodiments may be realized by reference to the following figures. In the appended figures, similar components or features may have the same reference label. Further, various components of the same type may be distinguished by following the reference label by a dash and a second label that distinguishes among the similar components. If only the first reference label is used in the specification, the description is applicable to any one of the similar components having the same first reference label irrespective of the second reference label.

FIG. 1 is a simplified image that can help illustrate how a Visual Inertial Tracker (VIT) can utilize targets for pose estimation and/or correction, according to an embodiment.

FIG. 2 is a block diagram of an example VIT.

FIG. 3 is a flow chart of a high-level process of drift correction, according to an embodiment, which can be executed by a VIT or other tracking system.

FIG. 4 is a flow diagram of a method of correcting drift in a VIT or other tracking system, according to an embodiment.

FIG. 5 is a block diagram of an embodiment of a mobile device.

DETAILED DESCRIPTION

The detailed description set forth below in connection with the appended drawings is intended as a description of various aspects of the present disclosure and is not intended to represent the only aspects in which the present disclosure may be practiced. Each aspect described in this disclosure is provided merely as an example or illustration of the present disclosure, and should not necessarily be construed as preferred or advantageous over other aspects. The detailed description includes specific details for the purpose of providing a thorough understanding of the present disclosure. However, it will be apparent to those skilled in the art that the present disclosure may be practiced without these specific details. In some instances, well-known structures and devices are shown in block diagram form in order to avoid obscuring the concepts of the present disclosure. Acronyms and other descriptive terminology may be used merely for convenience and clarity and are not intended to limit the scope of the disclosure.

Mobile devices, such as mobile phones, media players, tablets, head-mounted displays (HMDs) and other wearable electronic devices, and the like, can often execute applications and/or provide features that utilize the mobile device's translation and orientation, or “pose.” Tracking the mobile device's pose in a spatial coordinate system such as the Earth-Centered, Earth-Fixed (ECEF) coordinate system can be accomplished in any of a variety of ways. Often, this is done utilizing built-in sensors of the mobile device, such as accelerometers, gyroscopes, a Global Positioning System (GPS) receiver, and the like. A determination of the mobile device's pose can be used to enable or enhance navigation, games, and/or other applications.

Where GPS positioning is unavailable or unreliable, such as in indoor environments, the tracking of a mobile device's pose can be done by a Visual Inertial Tracker (VIT), which can combine measurements from visual tracking systems (which utilize visual sensors such as cameras) with measurements from inertial tracking systems (which utilize inertial sensors such as accelerometers and gyroscopes). Other sensors can be utilized as well, such as a barometer or altimeter to determine and/or correct altitude measurements. That said, where GPS coordinates are available, they may also be used to provide absolute location information to a VIT. One such embodiment of a VIT incorporates a Simultaneous Localization And Mapping (SLAM) system based on an Extended Kalman Filter (EKF) framework: the EKF receives various measurements from different sensors as listed above to track the pose of a phone. This type of system is referred to herein as an EKF-SLAM system. A keyframe based parallel tracking and mapping system (“PTAM”) is another example of a SLAM system.

Despite using both visual and inertial measurements, such VITs can still suffer from drift: inaccuracies in their inputs accumulate over time. The drift of current state-of-the-art VIT systems is about 1% of the distance traveled. For example, a VIT executed on a mobile device held by a walking user will drift about 1 meter for every 100 meters the user walks.
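Treating the quoted drift rate as linear in distance walked gives a simple budget for target placement: the walking distance between target sightings bounds the accumulated error. A minimal Python sketch under that linear-growth assumption (the function names and the 0.5 m tolerance in the usage are hypothetical):

```python
def accumulated_drift(distance_m: float, drift_rate: float = 0.01) -> float:
    """Expected position error after walking distance_m, assuming drift grows linearly."""
    return distance_m * drift_rate


def max_target_spacing(tolerance_m: float, drift_rate: float = 0.01) -> float:
    """Largest walking distance between target sightings that keeps drift below tolerance_m."""
    if drift_rate <= 0:
        raise ValueError("drift_rate must be positive")
    return tolerance_m / drift_rate


# 100 m of walking at 1% drift gives roughly 1 m of error; keeping error under
# 0.5 m would require a target sighting roughly every 50 m under this model.
print(accumulated_drift(100.0), max_target_spacing(0.5))
```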

Embodiments are described herein that can correct pose, and thereby provide drift correction, in VITs (such as the EKF-SLAM system previously described) by providing a pose measurement derived from an image of a target with a known position. This pose measurement can be used to correct drift, and such correction can take place each time a VIT captures an image of a known target.

FIG. 1 is a simplified image that can help illustrate how a VIT can utilize such targets for pose correction, according to one embodiment. In this embodiment, a user can use an application executed on a mobile device 120 to track the user's position in a store. Depending on the functionality of the application, it may display information on a display 130 of the mobile device 120, such as the user's position on a map of the store and/or where certain items may be located.

The application may track the user's position from information obtained by a VIT, which is also executed by the mobile device 120. As part of the tracking process, the camera of the mobile device 120 may capture an image 100 (e.g., a video frame) of a portion of the store that may include one or more targets 110. Targets 110 can be images or objects with known locations, shown in FIG. 1 as aisle signs (targets 110-1 and 110-2 at the ends of aisles 1 and 2, respectively). The targets 110 include one or more keypoints recognizable by a detection algorithm, which uses the keypoints to provide the pose measurement to the VIT. When one or more targets enter the camera view, an accurate pose can be determined from the known location(s) of the keypoints on the target(s) 110. This can be done in four steps: keypoint detection, keypoint matching, outlier rejection, and pose estimation based on minimization of reprojection error. Example implementations are described in “Real-Time Detection and Tracking for Augmented Reality on Mobile Phones” by Daniel Wagner et al. in IEEE Transactions on Visualization and Computer Graphics, 2009, found at http://dl.acm.org/citation.cfm?id-1605359, which is incorporated by reference herein. The pose in the VIT can then be replaced or updated by the pose obtained from the keypoints on the target to reduce drift. (Alternatively, if this is the first absolute pose measurement, the pose obtained from the target may be used to initialize the VIT.)
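Of the four steps listed above, the last one (pose estimation by minimizing reprojection error) is the most mathematical. A minimal Python/NumPy sketch, assuming matched, outlier-free 2-D/3-D correspondences, a distortion-free pinhole camera, an axis-angle rotation parameterization, and a forward-difference Jacobian. These are choices made here for brevity, not details taken from Wagner et al., and the function names are hypothetical:

```python
import numpy as np


def rodrigues(w):
    """Rotation matrix for an axis-angle vector w (radians)."""
    theta = np.linalg.norm(w)
    if theta < 1e-12:
        return np.eye(3)
    k = np.asarray(w, dtype=float) / theta
    K = np.array([[0.0, -k[2], k[1]],
                  [k[2], 0.0, -k[0]],
                  [-k[1], k[0], 0.0]])
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)


def project(params, X, fx, fy, cx, cy):
    """Pinhole projection of Nx3 target points X for params = [w (3), t (3)],
    using the convention X_cam = R X + t."""
    R = rodrigues(params[:3])
    Xc = X @ R.T + params[3:]
    return np.column_stack([fx * Xc[:, 0] / Xc[:, 2] + cx,
                            fy * Xc[:, 1] / Xc[:, 2] + cy])


def refine_pose(params, X, uv, intrinsics, iters=30, eps=1e-7):
    """Gauss-Newton minimization of the summed squared reprojection error."""
    params = np.asarray(params, dtype=float).copy()
    for _ in range(iters):
        r = (project(params, X, *intrinsics) - uv).ravel()
        J = np.empty((r.size, 6))
        for j in range(6):  # forward-difference Jacobian, one column per parameter
            step = np.zeros(6)
            step[j] = eps
            J[:, j] = ((project(params + step, X, *intrinsics) - uv).ravel() - r) / eps
        delta = np.linalg.lstsq(J, -r, rcond=None)[0]
        params += delta
        if np.linalg.norm(delta) < 1e-12:
            break
    return params
```

In practice the initial guess could come from the current VIT pose, which is usually close enough for Gauss-Newton to converge.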

As an example, if the image 100 reveals that the pose obtained from the keypoints on the target 110-1 corresponds to the pose determined by the VIT when the image 100 was taken, then no drift correction is needed. However, if there is a mismatch between the pose obtained from the keypoints on the target 110-1 and the pose determined by the VIT, then drift correction is needed.
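The disclosure does not quantify what counts as a mismatch. As a hedged illustration, a simple test compares the position and orientation differences against tolerances. The pose tuple layout, the quaternion convention (w, x, y, z), and the threshold values below are hypothetical, and a practical system might instead scale the tolerances by the VIT's pose covariance:

```python
import math


def rotation_angle_deg(q1, q2):
    """Angle of the relative rotation between unit quaternions (w, x, y, z), in degrees."""
    dot = abs(sum(a * b for a, b in zip(q1, q2)))  # abs() treats q and -q as the same rotation
    return math.degrees(2.0 * math.acos(min(1.0, dot)))


def needs_correction(vit_pose, target_pose, pos_tol_m=0.25, rot_tol_deg=3.0):
    """Each pose is ((x, y, z), (w, qx, qy, qz)); tolerances are illustrative only."""
    (p_vit, q_vit), (p_tgt, q_tgt) = vit_pose, target_pose
    return (math.dist(p_vit, p_tgt) > pos_tol_m
            or rotation_angle_deg(q_vit, q_tgt) > rot_tol_deg)
```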

In some embodiments, drift correction can include resetting the VIT's pose by replacing the pose previously determined by the VIT with the pose obtained using the keypoints on the target 110-1. Moreover, to avoid the processing cost of running keypoint detection frequently, some embodiments may analyze the VIT pose and VIT pose uncertainty and execute keypoint detection only if keypoints on a target are predicted to be in view.
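One way to realize this gating is to project the target into the image using the current VIT pose and enlarge the image bounds by a margin derived from the pose uncertainty. A minimal Python sketch, assuming a camera-to-world rotation R_wc and camera position t_wc from the VIT, an isotropic 1-sigma position uncertainty, and a hypothetical 3-sigma margin; rotation uncertainty is ignored for simplicity, and the function names are hypothetical:

```python
def world_to_camera(p_world, R_wc, t_wc):
    """Express a world point in camera coordinates: p_cam = R_wc^T (p_world - t_wc)."""
    d = [p_world[i] - t_wc[i] for i in range(3)]
    return [sum(R_wc[r][c] * d[r] for r in range(3)) for c in range(3)]


def target_predicted_in_view(p_cam, fx, fy, cx, cy, width, height, sigma_m, k=3.0):
    """Return True if the target may be visible, allowing a k-sigma position margin.
    p_cam uses x right, y down, z forward (meters)."""
    x, y, z = p_cam
    if z <= 0:
        return False  # target is behind the camera
    u = fx * x / z + cx
    v = fy * y / z + cy
    margin_u = k * sigma_m * fx / z  # position uncertainty projected to pixels
    margin_v = k * sigma_m * fy / z
    return -margin_u <= u <= width + margin_u and -margin_v <= v <= height + margin_v
```

Keypoint detection would then run only on frames for which this check returns True for at least one known target.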

Of course, different embodiments may perform such drift correction differently, depending on desired functionality, processing considerations, and other factors. For example, a pose can be calculated for each image in which a target appears, once for a given period of time and/or set of video frames, once for each series of video frames in which a target appears, and the like. A person of ordinary skill in the art will recognize many additional variations.

FIG. 2 is a block diagram of an example VIT 200. This VIT 200 employs an EKF-SLAM topology that utilizes a computer vision (CV) component 210 and an EKF component 220. These components can be executed in hardware and/or software of a mobile device 120, an example of which is described in further detail below with regard to FIG. 5.

As illustrated in FIG. 2, the CV component 210 can receive images, or camera frames, from a mobile device, where the camera frames have accurate time stamps. The time stamps allow the capture time of each image to be determined, so that measurements from the image can be fused in the EKF with time-stamped inertial sensor measurements from the same mobile device. Depending on desired functionality, the camera frames can be captured as a series of still images and/or as part of a video. Embodiments utilizing video capture can, for example, receive images at 30 frames per second; other embodiments may utilize other frame rates. In some embodiments, only a portion of the frames captured by a camera may be provided to and/or utilized by the CV component 210, while other embodiments may have CV components that utilize all frames.
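Because camera frames and inertial samples carry time stamps on a common clock, the inertial measurements to propagate between two frames can be selected with a binary search. A minimal Python sketch, assuming sorted integer time stamps in milliseconds and a hypothetical 200 Hz inertial rate (the 30 frames per second figure comes from the text); the function names are hypothetical:

```python
from bisect import bisect_right


def imu_between_frames(imu_times, imu_samples, t_prev, t_curr):
    """Return the inertial samples with t_prev < t <= t_curr; imu_times must be sorted."""
    lo = bisect_right(imu_times, t_prev)
    hi = bisect_right(imu_times, t_curr)
    return imu_samples[lo:hi]


def subsample_frames(frame_times, keep_every=1):
    """Keep every n-th frame, for embodiments that use only a portion of captured frames."""
    return frame_times[::keep_every]


# 200 Hz inertial samples (every 5 ms) between frames at 0 ms and 33 ms (about 30 fps).
imu_times = list(range(0, 101, 5))
print(imu_between_frames(imu_times, imu_times, 0, 33))
```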

The CV component 210 can also receive camera calibration information. Intrinsic camera calibration parameters include, for example, the principal point, focal length at infinity, and radial distortion. Extrinsic camera calibration parameters include the rotation and translation of the camera with respect to the inertial sensor chip. The rotation can be estimated in the VIT, or the inertial sensor axes can be assumed to be aligned with the camera. All other camera calibration parameters are very similar across mobile devices of the same type. Hence, obtaining them from one unit of a certain mobile phone model, for example, allows the calibration parameters to be applied to all mobile phones of that model.
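The intrinsic parameters named above enter the standard pinhole model with polynomial radial distortion. A minimal Python sketch, assuming a two-coefficient radial model (k1, k2) and a hypothetical per-model calibration table, which reflects the idea that one calibration can be reused for every phone of a given model. The model name and numeric values are made up for illustration:

```python
# Hypothetical per-model table: one calibration reused for every unit of that model.
CALIBRATION_BY_MODEL = {
    "example-phone-model": {"fx": 500.0, "fy": 500.0, "cx": 320.0, "cy": 240.0,
                            "k1": -0.2, "k2": 0.0},
}


def project_with_distortion(p_cam, fx, fy, cx, cy, k1, k2=0.0):
    """Project a camera-frame point (x, y, z) to pixels with radial distortion."""
    x, y, z = p_cam
    xn, yn = x / z, y / z             # normalized image-plane coordinates
    r2 = xn * xn + yn * yn
    d = 1.0 + k1 * r2 + k2 * r2 * r2  # polynomial radial distortion factor
    return fx * xn * d + cx, fy * yn * d + cy


cal = CALIBRATION_BY_MODEL["example-phone-model"]
print(project_with_distortion((0.1, 0.0, 1.0), **cal))
```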

Using the camera frames and camera calibration, the CV component 210 can employ any of a variety of algorithms to implement keypoint detection and keypoint tracking on the received camera frames. For example, keypoint detection can be based on Harris corners, and keypoint tracking can be based on Normalized Cross-Correlation (NCC). Keypoint detection and tracking provide 2-D keypoint measurements in the camera frame, which are relayed to the EKF component 220. The EKF component 220, utilizing sensor measurements (as detailed below), can calculate the predicted 2-D camera coordinates of keypoints and share them with the CV component 210 to limit the search space of the image point finder, increasing the efficiency of the process. The 2-D camera coordinates of keypoints measured by the image point finder and provided to the EKF component 220 are ultimately used to estimate the pose of the mobile device 120.
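NCC-based tracking and the EKF-limited search space can be combined in a few lines: the template patch is compared only at pixel positions within a small window around the predicted location. A minimal pure-Python sketch, assuming grayscale images stored as lists of rows and hypothetical patch and window sizes; a production tracker would typically use optimized image routines and sub-pixel refinement, and the function names are hypothetical:

```python
import math


def ncc(a, b):
    """Normalized cross-correlation of two equal-size patches given as flat lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    da = [x - ma for x in a]
    db = [y - mb for y in b]
    denom = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    return sum(x * y for x, y in zip(da, db)) / denom if denom > 0 else 0.0


def patch(img, u, v, half):
    """Flattened (2*half+1) x (2*half+1) patch centered at column u, row v."""
    return [img[r][c] for r in range(v - half, v + half + 1)
            for c in range(u - half, u + half + 1)]


def track_keypoint(img, template, pred_uv, half=2, radius=4):
    """Search only +/- radius pixels around the EKF-predicted location (u, v)."""
    h, w = len(img), len(img[0])
    pu, pv = pred_uv
    best, best_uv = -2.0, None
    for v in range(max(half, pv - radius), min(h - half, pv + radius + 1)):
        for u in range(max(half, pu - radius), min(w - half, pu + radius + 1)):
            score = ncc(template, patch(img, u, v, half))
            if score > best:
                best, best_uv = score, (u, v)
    return best_uv, best
```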

The EKF component 220 utilizes the 2-D keypoint measurements from the CV component 210, together with sensor measurements from a gyroscope (“gyro meas” in FIG. 2), an accelerometer (“accel meas”), and the like, to jointly estimate the VIT pose, the three-dimensional (3D) positions of the keypoints, the biases of the accelerometers and gyroscopes, and the gravity vector. For more information on an EKF-SLAM implementation, see Jones, Eagle S., and Stefano Soatto, “Visual-inertial navigation, mapping and localization: A scalable real-time causal approach,” The International Journal of Robotics Research 30.4 (2011): 407-430, which is incorporated by reference herein in its entirety.

GPS measurements (“GPS meas”) can also be provided to the EKF component to provide an absolute coordinate framework in which the mobile device 120 may be tracked. For example, a mobile device 120 initially may be in a location that can receive GPS measurements, and may therefore determine absolute location coordinates for the mobile device 120. As the mobile device moves to a location in which GPS measurements are not received (e.g., indoors), the VIT 200 may determine absolute coordinates of the mobile device 120 based on the mobile device's movement relative to a position in which absolute coordinates were determined based on GPS information.

Embodiments can further provide, as an input to the EKF component 220, a pose measurement derived from an image of a target. As described above, the pose measurement can come from a keypoint detector, which can not only detect keypoints of a target in an image but also determine the pose of the mobile device 120 based on the image. In some embodiments, the pose measurement may be provided from the CV component 210 and/or may be derived from 2D camera coordinates. Other embodiments may provide additional and/or alternative components to provide target detection and/or pose measurement.

Details regarding the targets can be locally stored and/or accessible by the VIT 200. Details can include location information such as absolute location of the target and/or its keypoints for pose calculation based on an image of the target. Additionally or alternatively, the details can include information regarding how the target may be identified.
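One possible layout for the locally stored target details is sketched below in Python. The class and field names, and the optional list of nearby access points, are hypothetical choices for illustration, not a format defined by the disclosure:

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class TargetKeypoint:
    position_world: Tuple[float, float, float]  # absolute 3-D coordinates, meters
    descriptor: bytes                           # e.g., a binary descriptor


@dataclass
class Target:
    target_id: str
    position_world: Tuple[float, float, float]  # absolute location of the target
    keypoints: List[TargetKeypoint] = field(default_factory=list)
    # Hypothetical: access points near the target, to help narrow the search.
    nearby_access_points: List[str] = field(default_factory=list)
```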

By being able to determine pose in relation to the target and by knowing the absolute coordinates of the target (and/or one or more keypoints of the target), the absolute pose can be determined and used by the EKF component 220 to correct for any drift that might have taken place. Correction can include, for example, overriding a pose calculated by the EKF component 220 with the newly-determined pose measurement.
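Converting the pose measured relative to the target into an absolute pose amounts to chaining homogeneous transforms: the target's surveyed pose in the world frame composed with the camera's pose in the target frame. A minimal Python/NumPy sketch, assuming 4x4 homogeneous matrices and an image-based pose convention X_cam = R X_target + t, which must be inverted to obtain the camera pose in the target frame. The function names are hypothetical:

```python
import numpy as np


def make_transform(R, t):
    """Build a 4x4 homogeneous transform from a 3x3 rotation and a translation."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T


def camera_in_target_from_pnp(R_ct, t_ct):
    """Invert X_cam = R_ct X_target + t_ct to get the camera pose in the target frame."""
    T = np.eye(4)
    T[:3, :3] = R_ct.T
    T[:3, 3] = -R_ct.T @ t_ct
    return T


def absolute_device_pose(T_world_target, T_target_camera):
    """Chain the target's surveyed world pose with the camera pose relative to the target."""
    return T_world_target @ T_target_camera
```

The resulting world-frame pose is what could then override, or be fused with, the pose calculated by the EKF.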

Depending on desired functionality, the EKF component 220 can output various types of data. As indicated in FIG. 2, for example, the EKF component 220 can output bias of an accelerometer, gyroscope, and/or other sensor (“accel bias gyro bias”), the determined pose of the mobile device (“pose of the phone”), 3D locations of keypoints (“3-D locations of keypoints”), and/or an estimation of the gravity vector (“gravity”). Any or all of these outputs may be influenced by the pose measurement determined from a detected target in a camera frame. The EKF component 220 seeks to minimize innovations between predicted and measured 2-D camera keypoints and can adjust inertial sensor biases, pose, gravity vector, and location of keypoints in 3-D to that end.
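How an innovation-driven update can also pull an inertial bias toward its true value can be illustrated with a deliberately reduced toy model. This minimal Python/NumPy sketch assumes a 1-D state of position, velocity, and accelerometer bias, with a target-derived absolute position as the measurement. Because this toy model is linear, the EKF update reduces to a standard Kalman update; the class name, noise values, and 1-D simplification are hypothetical and do not represent the EKF-SLAM state of FIG. 2:

```python
import numpy as np


class BiasAwareEKF1D:
    """Toy 1-D filter with state [position, velocity, accelerometer bias]."""

    def __init__(self, q_vel=1e-6, q_bias=1e-8):
        self.x = np.zeros(3)
        self.P = np.eye(3)
        self.q = np.diag([0.0, q_vel, q_bias])

    def predict(self, accel_meas, dt):
        p, v, b = self.x
        a = accel_meas - b  # bias-compensated acceleration
        self.x = np.array([p + v * dt + 0.5 * a * dt * dt, v + a * dt, b])
        F = np.array([[1.0, dt, -0.5 * dt * dt],
                      [0.0, 1.0, -dt],
                      [0.0, 0.0, 1.0]])
        self.P = F @ self.P @ F.T + self.q

    def update_position(self, z, r):
        """Absolute position measurement, e.g., derived from a detected target."""
        H = np.array([[1.0, 0.0, 0.0]])
        innovation = z - H @ self.x  # measured minus predicted
        S = H @ self.P @ H.T + r
        K = self.P @ H.T / S
        self.x = self.x + (K * innovation).ravel()
        self.P = (np.eye(3) - K @ H) @ self.P
        return innovation


# A stationary device whose accelerometer reads a constant 0.2 m/s^2 bias,
# with a target-based position fix of 0 m once per second.
ekf = BiasAwareEKF1D()
for step in range(1, 1001):
    ekf.predict(accel_meas=0.2, dt=0.01)
    if step % 100 == 0:
        ekf.update_position(z=0.0, r=1e-4)
print(ekf.x)  # the bias estimate ekf.x[2] approaches 0.2
```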

Keypoints on targets for pose correction in a VIT can be created in any of a variety of ways, for any of a variety of applications. For instance, if a picture of the target is taken from a fronto-parallel view, keypoints can be determined using the FAST corner algorithm, scale can be provided by measuring the distance between two of the keypoints, and descriptors can be obtained from the pixel measurements in the vicinity of the respective keypoints. The placement and designation of targets for a venue can vary, depending on desired functionality. Broadly speaking, the more targets that are located in and distributed throughout a venue, the more drift correction they can provide to a VIT, resulting in more accurate pose determination. The designation of targets and the creation of a venue map can be facilitated through an application on a mobile device. Additionally or alternatively, the data associated with the map (such as location information regarding the targets utilized to obtain pose measurements using the techniques described herein) can be collected along with the designation of targets and incorporated into the map.
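For a fronto-parallel photograph of a planar target, a single pixel-to-meter scale applies across the image, so a measured distance between two keypoints is enough to assign metric target-plane coordinates to every keypoint. A minimal Python sketch, assuming keypoints have already been detected (e.g., with FAST), lens distortion has been removed, and the target plane is z = 0 with the first reference keypoint as origin; descriptor extraction is omitted and the function name is hypothetical:

```python
import math


def keypoints_to_target_plane(keypoints_px, ref_a, ref_b, ref_distance_m):
    """Convert keypoint pixel coordinates (u, v) from a fronto-parallel photo of a planar
    target into metric 3-D coordinates on the target plane, with keypoint ref_a as origin."""
    meters_per_px = ref_distance_m / math.dist(keypoints_px[ref_a], keypoints_px[ref_b])
    u0, v0 = keypoints_px[ref_a]
    return [((u - u0) * meters_per_px, (v - v0) * meters_per_px, 0.0)
            for (u, v) in keypoints_px]
```

Adding the target's surveyed position and orientation to these plane coordinates would then give absolute keypoint coordinates for the venue map.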

FIG. 3 is a flow chart of a high-level process of drift correction, according to one embodiment, which can be executed by a VIT or other tracking system. More specifically, means for performing one or more of the illustrated components can include hardware and/or software means described in further detail in relation to FIG. 5, which may be logically separated into separate components, such as the components described in FIG. 2. Some or all of the components can be executed by hardware and/or software at an operating system or device level. Other embodiments may include alterations to the embodiments shown. Components shown in FIG. 3 may be performed in a different order and/or simultaneously, according to different embodiments. Moreover, a person of ordinary skill in the art will recognize many additions, omissions, and/or other variations.

The process can start by receiving a camera image at block 310. The type of camera image can vary in resolution, color, and/or other characteristics, depending on desired functionality, camera hardware, and/or other factors. Moreover, the camera image may be a discrete still image or may be one of several frames of video captured by the camera. In some embodiments, the image may be processed to a degree before it is received by a VIT, to facilitate further image processing by the VIT.

At block 320, the VIT optionally receives WiFi signals, which can facilitate the determination of which targets may be included in the received image. For example, wireless signals may be utilized together with a map of a venue that includes the identity and locations of WiFi access points. If the locations of certain access points can be determined from the map, the VIT can get a rough estimate of where in the venue the VIT (and any mobile device associated therewith) is. The VIT can do this by measuring WiFi signals received from the WiFi access points by the mobile device (e.g., measuring received signal strength (RSSI), round-trip time (RTT), and/or other measurements) to determine a proximity of the access points—including which access points may be closest. This can then be compared with the map to determine a region in the venue in which the mobile device is located.
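RSSI-based proximity is commonly modeled with a log-distance path-loss model. A minimal Python sketch, assuming hypothetical model parameters (a -40 dBm reference power at 1 m and a path-loss exponent of 2.5) and using an inverse-distance weighted centroid of known access-point locations as the rough position; RTT-based ranging would replace the first function. The function names are hypothetical:

```python
def rssi_to_distance_m(rssi_dbm, tx_power_dbm=-40.0, path_loss_exponent=2.5):
    """Log-distance path-loss model: RSSI = P0 - 10 * n * log10(d / 1 m)."""
    return 10 ** ((tx_power_dbm - rssi_dbm) / (10 * path_loss_exponent))


def rough_position(scans, ap_locations):
    """Inverse-distance weighted centroid of known access-point locations.
    scans: dict of AP identifier -> RSSI (dBm); ap_locations: identifier -> (x, y)."""
    wsum = wx = wy = 0.0
    for ap_id, rssi in scans.items():
        if ap_id not in ap_locations:
            continue  # AP is not on the venue map
        w = 1.0 / max(rssi_to_distance_m(rssi), 0.1)
        ax, ay = ap_locations[ap_id]
        wsum += w
        wx += w * ax
        wy += w * ay
    return (wx / wsum, wy / wsum) if wsum > 0 else None
```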

The VIT of the mobile device can then reduce processing loads related to target detection by determining nearby targets based on the WiFi signals, at block 330, and reducing the targets to detect to the nearby targets. Such optional functionality can be beneficial, for instance, when a customer starts an application that uses the map only after having entered the venue. With no location initially, the VIT can benefit from detecting rough location from WiFi signals. Furthermore, tight integration of GPS with VIT would allow for an initial position estimate using GPS measurements (code, Doppler, carrier phase) in addition to the regular VIT measurements listed above.
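Given a rough position, reducing the targets to detect can be as simple as a radius query against the venue map. A minimal Python sketch, assuming target locations in 2-D venue-map coordinates and a hypothetical 30 m radius, falling back to all targets when no WiFi estimate exists; the function name is hypothetical:

```python
import math


def nearby_targets(targets, rough_xy, radius_m=30.0):
    """Keep only targets within radius_m of the WiFi-derived rough position.
    targets: dict of target_id -> (x, y) in venue-map coordinates."""
    if rough_xy is None:
        return list(targets)  # no estimate yet: search for every known target
    return [tid for tid, xy in targets.items() if math.dist(xy, rough_xy) <= radius_m]
```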

It will be understood that, although WiFi signals are described in the embodiment shown in FIG. 3, additional or alternative wireless signals may be utilized.

At block 340, the image is processed to detect targets. As previously described, targets can be images or objects with known locations. Thus, the VIT can utilize a detection algorithm that determines whether keypoints of the image match keypoints of known targets, for example, by comparing the keypoints of the image with keypoints of one or more known targets (e.g., targets having keypoints and location information stored for comparison). Depending on the algorithm(s) used, embodiments can implement a variety of detection and matching techniques, from simple edge detection to the recognition of more complex patterns, symbols, and the like.

At block 350, a determination is made of whether a target is in the image. Techniques for making the determination can vary based on the detection algorithm(s) involved. In some embodiments, detection algorithms can determine whether a target is in the image by determining whether one or more keypoints in the image match with one or more corresponding keypoints of a known target to a degree above a threshold level of certainty.
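The keypoint comparison and threshold test of blocks 340 and 350 can be sketched with binary descriptors, Hamming distance, and a nearest-neighbor ratio test. This is a minimal Python sketch, assuming descriptors are stored as Python integers (e.g., 256-bit binary descriptors) and hypothetical values for the ratio, maximum distance, and minimum match count. The ratio test is one common matching heuristic, not necessarily the one used by any embodiment, and the function names are hypothetical:

```python
def hamming(a, b):
    """Number of differing bits between two binary descriptors stored as ints."""
    return bin(a ^ b).count("1")


def match_keypoints(image_desc, target_desc, ratio=0.8, max_dist=40):
    """Match each image descriptor to its nearest target descriptor, keeping the match
    only if it is close enough and clearly better than the second-best candidate."""
    matches = []
    if not target_desc:
        return matches
    for i, d in enumerate(image_desc):
        ranked = sorted((hamming(d, t), j) for j, t in enumerate(target_desc))
        best_dist, best_j = ranked[0]
        second_dist = ranked[1][0] if len(ranked) > 1 else float("inf")
        if best_dist <= max_dist and best_dist < ratio * second_dist:
            matches.append((i, best_j))
    return matches


def target_detected(image_desc, target_desc, min_matches=10, **match_kwargs):
    """Declare the target present when the number of matches reaches a threshold."""
    return len(match_keypoints(image_desc, target_desc, **match_kwargs)) >= min_matches
```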

If a target is not determined to be in an image, the process ends (potentially restarting with the receipt of a new image). However, if a target is determined to be in the image, a pose is determined based on the camera image, at block 360. As explained above with regard to FIG. 2, pose determination can utilize any of a variety of techniques to determine pose based on the known location of the target in the image, as well as information obtained from and/or associated with the image itself. For example, the VIT can determine a distance and orientation of the target in relation to the mobile device (e.g., by analyzing characteristics of detected keypoints in the image, such as location, spacing, etc.), and use this information, together with the known location (and orientation) of the target, to determine a pose of the mobile device.
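As a rough illustration of using keypoint spacing, the pinhole model gives the range to a target whose real keypoint spacing W appears as w pixels: Z = f W / w. A minimal Python sketch, assuming a fronto-parallel view and focal lengths in pixels; a full pose estimate would instead use all matched keypoints, and the function names are hypothetical:

```python
def distance_from_spacing(focal_px, real_spacing_m, pixel_spacing):
    """Pinhole range estimate to a fronto-parallel target: Z = f * W / w."""
    return focal_px * real_spacing_m / pixel_spacing


def target_position_in_camera(u, v, fx, fy, cx, cy, z):
    """Back-project the target's pixel location (u, v) at range z into camera coordinates."""
    return ((u - cx) * z / fx, (v - cy) * z / fy, z)


# Keypoints 0.5 m apart that appear 125 px apart with f = 500 px put the target about 2 m away.
z = distance_from_spacing(500.0, 0.5, 125.0)
print(z, target_position_in_camera(420.0, 240.0, 500.0, 500.0, 320.0, 240.0, z))
```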

At block 370, the pose from the target can be provided to an EKF of the VIT in the manner described previously, allowing the VIT to correct (e.g., adjust or replace) a pose determination (which may have been previously and/or separately determined from visual and/or inertial sensor input), based on the pose provided in the process of FIG. 3.

FIG. 4 is a flow diagram of another, more generalized method 400 of correcting drift in a VIT or other tracking system, according to one embodiment. Means for performing one or more of the components of the method 400 can include hardware and/or software means described in further detail in relation to FIG. 5, which may be logically separated into different components, such as the components described in FIG. 2. The method 400, and other techniques described herein, can be executed by hardware and/or software at an operating system or device level. Alternative embodiments may include alterations to the embodiments shown. Components of the method 400, although illustrated in a particular order, may be performed in a different order and/or simultaneously, according to different embodiments. Moreover, a person of ordinary skill in the art will recognize many additions, omissions, and/or other variations.

At block 410, location information regarding a target is obtained. As indicated previously, location information regarding a target can include keypoints associated with coordinates in a spatial coordinate system. Corresponding descriptors of one or more targets can be associated with a map, such as a map of a location in which a VIT system is used to track a mobile device's pose. In some embodiments, this information (as well as information for other targets of a venue) can be stored on a server of the venue, and transferred to a mobile device (e.g., wirelessly via WiFi using an application executed by the mobile device) when the mobile device enters or approaches the venue.
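
The location information of block 410 could, for example, be held on the mobile device in a structure like the one below once downloaded from the venue's server. This is a minimal sketch: the field names, the association of each target with nearby access-point identifiers, and the -70 dBm signal-strength cutoff are hypothetical. The filtering step merely illustrates one way of narrowing the set of known targets using received WiFi signals, consistent with determining the plurality of known targets from the proximity of access points as recited in claim 4.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class MapKeypoint:
    descriptor: Tuple[float, ...]           # appearance descriptor used for matching
    world_xyz: Tuple[float, float, float]   # coordinates in the venue's spatial frame


@dataclass
class Target:
    target_id: str
    keypoints: List[MapKeypoint]
    nearby_aps: Set[str] = field(default_factory=set)  # hypothetical: access points near this target


def candidate_targets(targets: List[Target],
                      scan: Dict[str, float],
                      min_rssi_dbm: float = -70.0) -> List[Target]:
    """Keep only targets associated with an access point heard at or above min_rssi_dbm."""
    close_aps = {ap for ap, rssi in scan.items() if rssi >= min_rssi_dbm}
    return [t for t in targets if t.nearby_aps & close_aps]


venue = [
    Target("poster-a", [MapKeypoint((0.1, 0.9), (1.0, 2.0, 1.5))], {"ap-1"}),
    Target("poster-b", [MapKeypoint((0.7, 0.3), (40.0, 5.0, 1.5))], {"ap-2"}),
]
print([t.target_id for t in candidate_targets(venue, {"ap-1": -52.0, "ap-2": -88.0})])
# ['poster-a']
```

Restricting matching to nearby targets keeps the comparison at block 420 small even when a venue stores many targets.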

At block 420, an image of the target is captured by the mobile device. The image can be captured as part of a VIT tracking process, and may be one of a series of video frames. As previously described, the image may be processed to extract keypoints, and one or more detection algorithms may be used to determine whether the target is in the image. For example, such algorithms may include comparing one or more keypoints extracted from the image with one or more keypoints of each target in a plurality of known targets of a venue.

At block 430, measurements relating to a pose of the mobile device are estimated using the keypoints positioned on the target. As previously indicated, keypoints with known positions can be used to reveal the pose of the mobile device in a spatial coordinate system.
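
As an illustrative sketch of block 430 only, assume the matched keypoints have already been expressed as planar positions in the device frame (for example, from depth data or triangulation) and that their map coordinates are known. The device pose is then the rigid transform that best aligns the two point sets in a least-squares sense (a two-dimensional Procrustes/Kabsch fit). A camera-only implementation would more typically solve a Perspective-n-Point (PnP) problem from 2D image observations and 3D map coordinates; the function name and the planar setting here are assumptions.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def estimate_pose_2d(device_pts: List[Point], world_pts: List[Point]) -> Tuple[float, float, float]:
    """Least-squares rigid transform mapping device-frame points onto world points.

    Returns (x, y, theta) such that world ~= R(theta) * device + (x, y),
    i.e. the device pose in the world frame (planar simplification).
    """
    n = len(device_pts)
    if n < 2 or n != len(world_pts):
        raise ValueError("need at least two matched point pairs")
    dcx = sum(p[0] for p in device_pts) / n
    dcy = sum(p[1] for p in device_pts) / n
    wcx = sum(p[0] for p in world_pts) / n
    wcy = sum(p[1] for p in world_pts) / n
    # Accumulate dot and cross terms of the centered point pairs.
    s_dot = s_cross = 0.0
    for (dx, dy), (wx, wy) in zip(device_pts, world_pts):
        ax, ay = dx - dcx, dy - dcy
        bx, by = wx - wcx, wy - wcy
        s_dot += ax * bx + ay * by
        s_cross += ax * by - ay * bx
    # The rotation maximizing sum(b . R a) has this closed form in 2D.
    theta = math.atan2(s_cross, s_dot)
    c, s = math.cos(theta), math.sin(theta)
    # Translation aligns the rotated device centroid with the world centroid.
    x = wcx - (c * dcx - s * dcy)
    y = wcy - (s * dcx + c * dcy)
    return x, y, theta


device = [(1.0, 0.0), (0.0, 1.0), (2.0, 2.0)]
c, s = math.cos(math.pi / 6), math.sin(math.pi / 6)
world = [(c * px - s * py + 3.0, s * px + c * py - 1.0) for px, py in device]
print(estimate_pose_2d(device, world))  # recovers approximately (3, -1, pi/6)
```

In practice the matched pairs would first pass through outlier rejection (e.g., RANSAC), since a single bad match can noticeably bias a least-squares fit.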

At block 440, a pose determination of the tracking system is corrected using an EKF, based on the measurements relating to the pose of the mobile device. As described above, the tracking system can use visual and inertial information to make the pose determination of the mobile device, which can be used in various applications, such as indoor navigation, augmented reality, and more. Because the pose determination is subject to drift, it can be corrected (e.g., modified or replaced) by providing measurements estimated from the image to an EKF. For example, a pose measurement can be obtained from a target using the process of keypoint detection, keypoint matching, outlier rejection, and pose estimation as described above. The pose measurement can then be provided to an EKF component to correct the pose of the mobile device. In alternative embodiments, a correction to a pose may not involve an EKF.

Depending on desired functionality, different embodiments may implement variations on the method 400 of correcting drift in a VIT illustrated in FIG. 4. For example, in one implementation, a customer may simply point a phone at a target to obtain a pose and, from it, a map and his or her position on the map, without any prior and/or subsequent tracking. That is, the pose obtained from a target may provide an initial pose to a VIT, in addition or as an alternative to replacing a pose previously determined by the VIT. In some embodiments, a VIT may further obtain absolute coordinates of the target, enabling the VIT to correct a determined pose of the mobile device using the keypoints of the target and the absolute coordinates associated therewith. A person of ordinary skill in the art will recognize many additional variations.

FIG. 5 is a block diagram of an embodiment of a mobile device 120, which can implement the techniques for correcting a pose determination of the tracking system, such as the method 400 shown in FIG. 4. It should be noted that FIG. 5 is meant only to provide a generalized illustration of various components, any or all of which may be utilized as appropriate. Moreover, system elements may be implemented in a relatively separated or relatively more integrated manner. Additionally or alternatively, some or all of the components shown in FIG. 5 can be utilized in another computing device, which can be used in conjunction with a mobile device 120 as previously described.

The mobile device 120 is shown comprising hardware elements that can be electrically coupled via a bus 505 (or may otherwise be in communication, as appropriate). The hardware elements may include a processing unit 510 which can include without limitation one or more general-purpose processors, one or more special-purpose processors (such as digital signal processors (DSPs), graphics acceleration processors, application specific integrated circuits (ASICs), and/or the like), and/or other processing structure or means, which can be configured to perform one or more of the methods described herein, including methods illustrated in FIGS. 3-4. As shown in FIG. 5, some embodiments may have a separate DSP 520, depending on desired functionality. The mobile device 120 also can include one or more input devices 570, which can include without limitation one or more camera(s), a touch screen, a touch pad, microphone, button(s), dial(s), switch(es), and/or the like; and one or more output devices 515, which can include without limitation a display, light emitting diode (LED), speakers, and/or the like.

The mobile device 120 might also include a wireless communication interface 530, which can include without limitation a modem, a network card, an infrared communication device, a wireless communication device, and/or a chipset (such as a Bluetooth™ device, an IEEE 802.11 device, an IEEE 802.15.4 device, a WiFi device, a WiMax device, cellular communication facilities, etc.), and/or the like. The wireless communication interface 530 may permit data to be exchanged with a network, wireless access points, other computer systems, and/or any other electronic devices described herein. The communication can be carried out via one or more wireless communication antenna(s) 532 that send and/or receive wireless signals 534.

Depending on desired functionality, the wireless communication interface 530 can include separate transceivers to communicate with base transceiver stations (e.g., base transceiver stations of a cellular network) and access points. These different data networks can include an OFDMA network and/or other types of networks.

The mobile device 120 can further include sensor(s) 540, as previously described. Such sensors can include, without limitation, one or more accelerometer(s), gyroscope(s), camera(s), magnetometer(s), altimeter(s), microphone(s), proximity sensor(s), light sensor(s), and the like. At least a subset of the sensor(s) 540 can provide camera frames and/or inertial information used by a VIT for tracking.

Embodiments of the mobile device may also include a Satellite Positioning System (SPS) receiver 580 capable of receiving signals 584 from one or more SPS satellites using an SPS antenna 582. Such positioning can be utilized to complement and/or be incorporated in the techniques described herein. It can be noted that, as used herein, an SPS may include any combination of one or more global and/or regional navigation satellite systems and/or augmentation systems, and SPS signals may include SPS, SPS-like, and/or other signals associated with such one or more SPS. GPS is an example of an SPS.

The mobile device 120 may further include and/or be in communication with a memory 560. The memory 560 can include, without limitation, local and/or network accessible storage, a disk drive, a drive array, an optical storage device, a solid-state storage device, such as a random access memory (“RAM”), and/or a read-only memory (“ROM”), which can be programmable, flash-updateable, and/or the like. Such storage devices may be configured to implement any appropriate data structures, such as the FIFO and/or other memory utilized by the techniques described herein, and may be allocated by hardware and/or software elements of an OFDM receiver. Additionally or alternatively, data structures described herein can be implemented by a cache or other local memory of a DSP 520 or processing unit 510. Memory can further be used to store an image stack, inertial sensor data, and/or other information described herein.

The memory 560 of the mobile device 120 also can comprise software elements (not shown), including an operating system, device drivers, executable libraries, and/or other code, such as one or more application programs, which may comprise computer programs provided by various embodiments, and/or may be designed to implement methods, and/or configure systems, provided by other embodiments, as described herein. Merely by way of example, one or more procedures described with respect to the method(s) discussed above, such as the methods illustrated in FIGS. 3-4, might be implemented as code and/or instructions executable by the mobile device 120 (and/or processing unit 510 within a mobile device 120) and/or stored on a non-transitory and/or machine-readable storage medium (e.g., a “computer-readable storage medium,” a “machine-readable storage medium,” etc.). In an aspect, then, such code and/or instructions can be used to configure and/or adapt a general purpose processor (or other device) to perform one or more operations in accordance with the described methods.

It will be apparent to those skilled in the art that substantial variations may be made in accordance with specific requirements. For example, customized hardware might also be used, and/or particular elements might be implemented in hardware, software (including portable software, such as applets, etc.), or both. Further, connection to other computing devices such as network input/output devices may be employed.

The methods, systems, and devices discussed above are examples. Various configurations may omit, substitute, or add various procedures or components as appropriate. For instance, in alternative configurations, the methods may be performed in an order different from that described, and/or various stages may be added, omitted, and/or combined. Also, features described with respect to certain configurations may be combined in various other configurations. Different aspects and elements of the configurations may be combined in a similar manner. Also, technology evolves and, thus, many of the elements are examples and do not limit the scope of the disclosure or claims.

The term Computer Vision (CV) application as used herein refers to a class of applications related to the acquisition, processing, analyzing, and understanding of images. CV applications include, without limitation, mapping, modeling (including 3D modeling), navigation, augmented reality applications, and various other applications where images acquired from an image sensor are processed to build maps, models, and/or to derive/represent structural information about the environment from the captured images. In many CV applications, geometric information related to captured images may be used to build a map, model, and/or other representation of objects and/or other features in a physical environment.

It can be further noted that, although examples described herein are implemented by a mobile device, embodiments are not so limited. Embodiments can include, for example, personal computers and/or other electronics not generally considered “mobile.” A person of ordinary skill in the art will recognize many alterations to the described embodiments.

The terms “and” and “or,” as used herein, may include a variety of meanings that are also expected to depend at least in part upon the context in which such terms are used. Typically, “or,” if used to associate a list such as A, B, or C, is intended to mean A, B, and C, here used in the inclusive sense, as well as A, B, or C, here used in the exclusive sense. In addition, the term “one or more” as used herein may be used to describe any feature, structure, or characteristic in the singular or may be used to describe some combination of features, structures, or characteristics. However, it should be noted that this is merely an illustrative example and claimed subject matter is not limited to this example. Furthermore, the term “at least one of,” if used to associate a list such as A, B, or C, can be interpreted to mean any combination of A, B, and/or C, such as A, AB, AA, AAB, AABBCCC, etc.

Having described several example configurations, various modifications, alternative constructions, and equivalents may be used without departing from the spirit of the disclosure. For example, the above elements may be components of a larger system, wherein other rules may take precedence over or otherwise modify the application of embodiments. Also, a number of steps may be undertaken before, during, or after the above elements are considered. Accordingly, the above description does not bound the scope of the claims. 

What is claimed is:
 1. A method of correcting drift in a tracking system of a mobile device, the method comprising: obtaining location information regarding a target; obtaining an image of the target, the image captured by the mobile device; estimating, from the image of the target, measurements relating to a pose of the mobile device based on the image and location information, wherein the pose comprises information indicative of a translation and orientation of the mobile device; and correcting a pose determination of the mobile device using an Extended Kalman Filter (EKF), based, at least in part, on the measurements relating to the pose of the mobile device.
 2. The method of claim 1, further comprising obtaining absolute coordinates of the target, wherein correcting the pose is further based, at least in part, on the absolute coordinates of the target.
 3. The method of claim 1, further comprising processing the image of the target to determine that the target was captured in the image, wherein the processing includes comparing one or more keypoints of the image with one or more keypoints of each target in a plurality of known targets.
 4. The method of claim 3, further comprising: receiving one or more wireless signals from one or more access points; determining a proximity of the one or more access points based on wireless signals; and determining the plurality of known targets, based on the determined proximity of the one or more access points.
 5. The method of claim 1, wherein the tracking system incorporates a Simultaneous Localization And Mapping (SLAM) system with the EKF.
 6. The method of claim 1, wherein the pose determination is based, at least in part, on measurements from one or more of an accelerometer or a gyroscope of the mobile device.
 7. The method of claim 6, further comprising determining a bias of one or more of the accelerometer or the gyroscope of the mobile device, based at least in part on the pose determination and the corrected pose.
 8. A mobile device comprising: a camera; a memory; and a processing unit operatively coupled with the camera and the memory and configured to: obtain location information regarding a target; obtain an image of the target, the image captured by the camera of the mobile device; estimate, from the image of the target, measurements relating to a pose of the mobile device based on the image and location information, wherein the pose comprises information indicative of a translation and orientation of the mobile device; and correct a pose determination of the mobile device using an Extended Kalman Filter (EKF), based, at least in part, on the measurements relating to the pose of the mobile device.
 9. The mobile device of claim 8, wherein the processing unit is configured to obtain absolute coordinates of the target, and wherein the processing unit is further configured to correct the pose based, at least in part, on the absolute coordinates of the target.
 10. The mobile device of claim 8, wherein the processing unit is configured to process the image of the target to determine that the target was captured in the image, and wherein processing the image includes comparing one or more features of the image with one or more features of each target in a plurality of known targets.
 11. The mobile device of claim 10, further comprising a wireless communication interface configured to receive one or more wireless signals from one or more access points, wherein the processing unit is further configured to: determine a proximity of the one or more access points based on wireless signals; and determine the plurality of known targets, based on the determined proximity of the one or more access points.
 12. The mobile device of claim 8, wherein the processing unit is configured to incorporate a Simultaneous Localization And Mapping (SLAM) system with the EKF.
 13. The mobile device of claim 8, further comprising one or more motion sensors, wherein the processing unit is further configured to determine the pose determination based, at least in part, on one or more measurements received from the one or more motion sensors.
 14. The mobile device of claim 13, wherein the one or more motion sensors include one or more of an accelerometer or a gyroscope.
 15. The mobile device of claim 13, wherein the processing unit is configured to determine a bias of the one or more motion sensors, based at least in part on the pose determination and the corrected pose.
 16. An apparatus comprising: means for obtaining location information regarding a target; means for obtaining an image of the target, the image captured by a mobile device; means for estimating, from the image of the target, measurements relating to a pose of the mobile device based on the image and location information, wherein the pose comprises information indicative of a translation and orientation of the mobile device; and means for correcting a pose determination of the mobile device using an Extended Kalman Filter (EKF), based, at least in part, on the measurements relating to the pose of the mobile device.
 17. The apparatus of claim 16, further comprising means for obtaining absolute coordinates of the target, wherein the means for correcting the pose is configured to base the corrected pose, at least in part, on the absolute coordinates of the target.
 18. The apparatus of claim 16, further comprising means for processing the image of the target to determine that the target was captured in the image, wherein the means for processing the image include means for comparing one or more features of the image with one or more features of each target in a plurality of known targets.
 19. The apparatus of claim 18, further comprising: means for receiving one or more wireless signals from one or more access points; means for determining a proximity of the one or more access points based on wireless signals; and means for determining the plurality of known targets, based on the determined proximity of the one or more access points.
 20. The apparatus of claim 16, further comprising means for incorporating a Simultaneous Localization And Mapping (SLAM) system with the EKF.
 21. The apparatus of claim 16, further comprising means for basing the pose determination, at least in part, on measurements of one or more of an accelerometer or a gyroscope of the mobile device.
 22. The apparatus of claim 21, further comprising means for determining a bias of one or more of the accelerometer or the gyroscope of the mobile device, based at least in part on the pose determination and the corrected pose.
 23. A non-transitory machine-readable medium having instructions embedded thereon for correcting drift in a tracking system of a mobile device, the instructions including computer code for: obtaining location information regarding a target; obtaining an image of the target, the image captured by the mobile device; estimating, from the image of the target, measurements relating to a pose of the mobile device based on the image and location information, wherein the pose comprises information indicative of a translation and orientation of the mobile device; and correcting a pose determination of the mobile device using an Extended Kalman Filter (EKF), based, at least in part, on the measurements relating to the pose of the mobile device.
 24. The non-transitory machine-readable medium of claim 23, the instructions further including computer code for obtaining absolute coordinates of the target, wherein the computer code is further configured to base correcting the pose, at least in part, on the absolute coordinates of the target.
 25. The non-transitory machine-readable medium of claim 23, the instructions further including computer code for processing the image of the target to determine that the target was captured in the image, wherein the computer code for processing includes computer code for comparing one or more features of the image with one or more features of each target in a plurality of known targets.
 26. The non-transitory machine-readable medium of claim 25, the instructions further including computer code for: receiving one or more wireless signals from one or more access points; determining a proximity of the one or more access points based on wireless signals; and determining the plurality of known targets, based on the determined proximity of the one or more access points.
 27. The non-transitory machine-readable medium of claim 23, the instructions further including computer code for incorporating a Simultaneous Localization And Mapping (SLAM) system with the EKF.
 28. The non-transitory machine-readable medium of claim 23, wherein the computer code can be configured to base the pose determination, at least in part, on measurements from one or more of an accelerometer or a gyroscope of the mobile device.
 29. The non-transitory machine-readable medium of claim 28, the instructions further including computer code for determining a bias of one or more of the accelerometer or the gyroscope of the mobile device, based at least in part on the pose determination and the corrected pose. 